Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle casts done via helpers and fold overflow operations in value numbering #50450

Conversation

SingleAccretion
Copy link
Contributor

@SingleAccretion SingleAccretion commented Mar 30, 2021

Contributes to #49535, #1115 and #32776.

Implements the VN part of #1115 (comment), also enables value numbering of casts that are done via helpers.

There are three major parts to this PR.

  1. Refactoring of gtFoldExprConst and the creation of CheckedOps namespace so that the overflow checking can live in one place and be shared between the aforementioned gtFoldExprConst and VN. I also generally cleaned up the folding function as per the style guide. Notably, the behavior for overflowing unchecked casts from floating point types to integer types has been preserved - it will have to wait until Determine the strategy that should be used by the Jit when folding implementation-defined casts #47478 is resolved. As the handling for the floating point -> integer overflow checks was a bit simplistic, I had to implement it from scratch, with some commentary on the choice of constants.

  2. Unblocking of VN folding the checked operations when the arguments have a known constant VN, including avoiding the addition of exceptions to the exception set for such cases and the addition of relevant asserts - only for the checked cases.

  3. Conversion of all VNFuncs that represented casts performed via helpers to plain VNF_Cast/Ovfs. This is the part of the change I am least confident in, at the same time, it avoids special-casing the helpers in folding and streamlines the handling in general. There is an associated TODO on how this could be avoided if transforming the relevant GT_CASTs to the relevant helpers could be done by the backend. I do not know how feasible would that be.

The added tests have been generated via a script (the #if controls which part is generated - for arithmetic or for casts).

Draft for now to see what the CI thinks.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 30, 2021
@SingleAccretion SingleAccretion force-pushed the Fold-Overflow-Operations-In-Value-Numbering branch 9 times, most recently from e09888e to 74605e0 Compare March 31, 2021 19:45
@SingleAccretion
Copy link
Contributor Author

SingleAccretion commented Mar 31, 2021

Work that needs to be done before this can be ready for review:

  • Fold the helpers for casts in morph - gtFoldExprConst(gtNewCastNode()) is probably acceptable given these are very cold paths.
  • Split the test for casts into multiple files so that it can be reviewed.
  • Make sure the cases in the referenced issues are optimized as expected.
  • Report and analyze the final diffs via SPMI, diff the CoreLib without Debug.Asserts with PMI.
  • Evaluate the feasibility of avoiding regressions - describe the assertion prop issue with nested falsely side-effecting trees.
  • Make the CI green...

Optional: diff a CoreLib compiled with -checked.

@SingleAccretion SingleAccretion force-pushed the Fold-Overflow-Operations-In-Value-Numbering branch 2 times, most recently from 23aacfc to 52649d5 Compare April 3, 2021 21:07
@SingleAccretion
Copy link
Contributor Author

So, what does this PR achieve?

Not much.

The case in #49535 is not actually related (it was just me not reading things correctly...), but there is some improvement in the targeted area.

The case in #1115 is optimized as expected. I expect this work to contribute to MemoryMarshal.Cast's performance in cases where sizeof(TFrom) >= sizeof(TTo) or length known, such as in #32776.

The SPMI diffs are generally positive:

And also tiny
benchmarks.run.windows.x64.checked.mch

Generated asm is located under C:\Users\Accretion\source\dotnet\runtime\artifacts\spmi\asm.windows.x64.Checked.11\base C:\Users\Accretion\source\dotnet\runtime\artifacts\spmi\asm.windows.x64.Checked.11\diff

Top method regressions (bytes):
           8 ( 2.62% of base) : 16235.dasm - Benchstone.BenchF.BenchMrk:Test():bool:this
           6 ( 0.14% of base) : 2665.dasm - Utf8Json.Formatters.ISO8601DateTimeOffsetFormatter:Deserialize(byref,Utf8Json.IJsonFormatterResolver):System.DateTimeOffset:this

Top method regressions (percentages):
           8 ( 2.62% of base) : 16235.dasm - Benchstone.BenchF.BenchMrk:Test():bool:this
           6 ( 0.14% of base) : 2665.dasm - Utf8Json.Formatters.ISO8601DateTimeOffsetFormatter:Deserialize(byref,Utf8Json.IJsonFormatterResolver):System.DateTimeOffset:this

libraries.pmi.windows.x64.checked.mch

Total bytes of base: 7351
Total bytes of diff: 7208
Total bytes of delta: -143 (-1.95% of base)
    diff is an improvement.

Top method regressions (bytes):
          26 ( 3.03% of base) : 147560.dasm - Microsoft.VisualBasic.FileIO.FileSystem:FileContainsText(System.String,System.String,bool):bool

Top method improvements (bytes):
         -77 (-15.34% of base) : 169396.dasm - System.Diagnostics.ActivityTraceId:.ctor(System.ReadOnlySpan`1[Byte]):this
         -50 (-16.03% of base) : 210169.dasm - System.Security.Cryptography.RandomNumberGenerator:GetInt32(int,int):int
         -26 (-6.57% of base) : 166994.dasm - System.Data.Odbc.OdbcConnection:GetInfoStringUnhandled(ushort,bool):System.String:this
         -16 (-0.94% of base) : 215662.dasm - System.Security.Principal.WindowsIdentity:.ctor(System.String):this

Top method regressions (percentages):
          26 ( 3.03% of base) : 147560.dasm - Microsoft.VisualBasic.FileIO.FileSystem:FileContainsText(System.String,System.String,bool):bool

Top method improvements (percentages):
         -50 (-16.03% of base) : 210169.dasm - System.Security.Cryptography.RandomNumberGenerator:GetInt32(int,int):int
         -77 (-15.34% of base) : 169396.dasm - System.Diagnostics.ActivityTraceId:.ctor(System.ReadOnlySpan`1[Byte]):this
         -26 (-6.57% of base) : 166994.dasm - System.Data.Odbc.OdbcConnection:GetInfoStringUnhandled(ushort,bool):System.String:this
         -16 (-0.94% of base) : 215662.dasm - System.Security.Principal.WindowsIdentity:.ctor(System.String):this

tests.pmi.windows.x64.checked.mch

Total bytes of diff: 2022846
Total bytes of delta: -15880 (-0.78% of base)
    diff is an improvement.

Top file regressions (bytes):
          84 : 87204.dasm (46.41% of base)
          70 : 87176.dasm (225.81% of base)
          56 : 87208.dasm (13.18% of base)
          29 : 86540.dasm (9.73% of base)
          27 : 85086.dasm (3.55% of base)
          25 : 86775.dasm (16.23% of base)
          20 : 85154.dasm (2.29% of base)
          20 : 87201.dasm (10.31% of base)
          19 : 87141.dasm (38.78% of base)
          18 : 85122.dasm (1.47% of base)
          17 : 87153.dasm (30.91% of base)
          16 : 86955.dasm (9.82% of base)
          14 : 245769.dasm (21.21% of base)
          14 : 245771.dasm (20.00% of base)
          13 : 86694.dasm (12.15% of base)
          13 : 133514.dasm (3.01% of base)
          12 : 133857.dasm (1.05% of base)
          11 : 245770.dasm (17.46% of base)
          11 : 245772.dasm (17.46% of base)
          10 : 87341.dasm (58.82% of base)

Top file improvements (bytes):
       -1122 : 222352.dasm (-3.41% of base)
       -1116 : 222362.dasm (-3.31% of base)
       -1032 : 215606.dasm (-0.75% of base)
        -930 : 249053.dasm (-43.70% of base)
        -592 : 215586.dasm (-0.50% of base)
        -576 : 215616.dasm (-0.42% of base)
        -452 : 190444.dasm (-0.32% of base)
        -422 : 215576.dasm (-0.35% of base)
        -332 : 190297.dasm (-0.25% of base)
        -317 : 215533.dasm (-1.16% of base)
        -317 : 215539.dasm (-1.16% of base)
        -301 : 222324.dasm (-1.20% of base)
        -289 : 222314.dasm (-1.18% of base)
        -268 : 190464.dasm (-0.19% of base)
        -188 : 87197.dasm (-39.50% of base)
        -184 : 215596.dasm (-0.13% of base)
        -169 : 87210.dasm (-49.85% of base)
        -159 : 190357.dasm (-0.12% of base)
        -148 : 85133.dasm (-13.21% of base)
        -136 : 85073.dasm (-12.34% of base)

290 total files with Code Size differences (256 improved, 34 regressed), 24 unchanged.

Top method regressions (bytes):
          84 (46.41% of base) : 87204.dasm - DevDiv_590772:ILGEN_METHOD(long,ushort,long,int,ushort,long):byte
          70 (225.81% of base) : 87176.dasm - ILGEN_CLASS:ILGEN_METHOD(long,short,int):long
          56 (13.18% of base) : 87208.dasm - DevDiv_605447:ILGEN_METHOD(int,int,int,ushort,float):ushort
          29 ( 9.73% of base) : 86540.dasm - ILGEN_0x497ea5a8:main():int
          27 ( 3.55% of base) : 85086.dasm - ILGEN_0x3aa9c940:main():int
          25 (16.23% of base) : 86775.dasm - ILGEN_0x68eb95f0:Method_0x2dae(byte,int,long,byte,ushort,long):float
          20 ( 2.29% of base) : 85154.dasm - ILGEN_0x1bd95bae:Method_0x40637edd(int,byte,short,short,double,float,ubyte,float):short
          20 (10.31% of base) : 87201.dasm - DevDiv_578217.Program:ILGEN_METHOD(ushort,float,long,ushort,ubyte,short,int):short
          19 (38.78% of base) : 87141.dasm - C:M(bool,short,short,int):short
          18 ( 1.47% of base) : 85122.dasm - ILGEN_0xbc077bd:Method_0xa67f86e1(long,double,short,int,float,int,short,int,int):int
          17 (30.91% of base) : 87153.dasm - C:M():float
          16 ( 9.82% of base) : 86955.dasm - ILGEN_0x981b6a55:Method_0x9d35bca7():float
          14 (21.21% of base) : 245769.dasm - OVFTest:Test_sbyte():byte
          14 (20.00% of base) : 245771.dasm - OVFTest:Test_short():short
          13 (12.15% of base) : 86694.dasm - ILGEN_0x11c02e62:Method_0x7f2e741b(ubyte,ubyte,float,float,int):int
          13 ( 3.01% of base) : 133514.dasm - testout1:Func_0_4_3_1_6():long
          12 ( 1.05% of base) : 133857.dasm - testout1:Func_0_6_1_6_4():float
          11 (17.46% of base) : 245770.dasm - OVFTest:Test_byte():ubyte
          11 (17.46% of base) : 245772.dasm - OVFTest:Test_ushort():ushort
          10 (58.82% of base) : 87341.dasm - _simple:test():long

Top method improvements (bytes):
       -1122 (-3.41% of base) : 222352.dasm - lclfldrem:Main():int
       -1116 (-3.31% of base) : 222362.dasm - lclfldrem:Main():int
       -1032 (-0.75% of base) : 215606.dasm - r4rem:Main():int
        -930 (-43.70% of base) : 249053.dasm - ILGEN_0x1d013582:Method_0x7ef63454():long
        -592 (-0.50% of base) : 215586.dasm - r8div:Main():int
        -576 (-0.42% of base) : 215616.dasm - r8rem:Main():int
        -452 (-0.32% of base) : 190444.dasm - u4rem:Main():int
        -422 (-0.35% of base) : 215576.dasm - r4div:Main():int
        -332 (-0.25% of base) : 190297.dasm - u4div:Main():int
        -317 (-1.16% of base) : 215533.dasm - r8NaNrem:Main():int
        -317 (-1.16% of base) : 215539.dasm - r8NaNrem:Main():int
        -301 (-1.20% of base) : 222324.dasm - lclflddiv:Main():int
        -289 (-1.18% of base) : 222314.dasm - lclflddiv:Main():int
        -268 (-0.19% of base) : 190464.dasm - u8rem:Main():int
        -188 (-39.50% of base) : 87197.dasm - DevDiv_545500:Test(float,int,long):ubyte
        -184 (-0.13% of base) : 215596.dasm - i8rem:Main():int
        -169 (-49.85% of base) : 87210.dasm - DevDiv_710234:Test(short,int,bool,double,int,int):long
        -159 (-0.12% of base) : 190357.dasm - i4rem:Main():int
        -148 (-13.21% of base) : 85133.dasm - ILGEN_0x1759e5bf:Method_0x8c3e201b(long,ubyte,ubyte,float,ushort):int
        -136 (-12.34% of base) : 85073.dasm - ILGEN_0x86457e59:main():int

Top method regressions (percentages):
          70 (225.81% of base) : 87176.dasm - ILGEN_CLASS:ILGEN_METHOD(long,short,int):long
          10 (58.82% of base) : 87341.dasm - _simple:test():long
          84 (46.41% of base) : 87204.dasm - DevDiv_590772:ILGEN_METHOD(long,ushort,long,int,ushort,long):byte
          19 (38.78% of base) : 87141.dasm - C:M(bool,short,short,int):short
          17 (30.91% of base) : 87153.dasm - C:M():float
          14 (21.21% of base) : 245769.dasm - OVFTest:Test_sbyte():byte
          14 (20.00% of base) : 245771.dasm - OVFTest:Test_short():short
          11 (17.46% of base) : 245770.dasm - OVFTest:Test_byte():ubyte
          11 (17.46% of base) : 245772.dasm - OVFTest:Test_ushort():ushort
          25 (16.23% of base) : 86775.dasm - ILGEN_0x68eb95f0:Method_0x2dae(byte,int,long,byte,ushort,long):float
          56 (13.18% of base) : 87208.dasm - DevDiv_605447:ILGEN_METHOD(int,int,int,ushort,float):ushort
          13 (12.15% of base) : 86694.dasm - ILGEN_0x11c02e62:Method_0x7f2e741b(ubyte,ubyte,float,float,int):int
          20 (10.31% of base) : 87201.dasm - DevDiv_578217.Program:ILGEN_METHOD(ushort,float,long,ushort,ubyte,short,int):short
          16 ( 9.82% of base) : 86955.dasm - ILGEN_0x981b6a55:Method_0x9d35bca7():float
          29 ( 9.73% of base) : 86540.dasm - ILGEN_0x497ea5a8:main():int
           9 ( 8.82% of base) : 87166.dasm - ILGEN_CLASS:ILGEN_METHOD(double,ushort,int):bool
           3 ( 6.12% of base) : 235329.dasm - ConvTest:test4():long
           3 ( 6.12% of base) : 235395.dasm - ConvTest:test4():long
           4 ( 6.06% of base) : 82166.dasm - C:L(float):float
          27 ( 3.55% of base) : 85086.dasm - ILGEN_0x3aa9c940:main():int

Top method improvements (percentages):
        -104 (-94.55% of base) : 86541.dasm - ILGEN_0xa7db44c0:main():int
         -24 (-80.00% of base) : 87190.dasm - DevDiv_544983:Test(int):int
         -22 (-78.57% of base) : 86906.dasm - ILGEN_0x2693d3a2:Method_2(int):int
         -22 (-78.57% of base) : 86623.dasm - ILGEN_0x34e68074:Method_0x1f01(float):int
         -22 (-78.57% of base) : 86907.dasm - ILGEN_0x2693d3a2:Main():int
         -20 (-76.92% of base) : 86905.dasm - ILGEN_0x2693d3a2:Method_1(int):int
         -44 (-72.13% of base) : 87154.dasm - ILGEN_CLASS:ILGEN_METHOD(short,ushort,ushort,long):float
         -47 (-70.15% of base) : 86839.dasm - ILGEN_0xa09c7ca6:Method_0x1ff34cdc(long):int
         -22 (-64.71% of base) : 87072.dasm - FullProof:Test():int
         -74 (-60.16% of base) : 82168.dasm - C:Main():int
         -96 (-57.83% of base) : 87195.dasm - DevDiv_545497:ILGEN_METHOD(float,int,long):ubyte
          -8 (-57.14% of base) : 86492.dasm - ILGEN_622380794:main():int
         -55 (-56.70% of base) : 86470.dasm - ILGEN_622380794:main():int
         -88 (-56.05% of base) : 85256.dasm - FullProof:Test():int
        -100 (-51.28% of base) : 249447.dasm - Light:Main(System.String[]):int
         -12 (-50.00% of base) : 133943.dasm - testout1:Func_0_6_5_1_3():float
         -12 (-50.00% of base) : 136304.dasm - testout1:Func_0_6_5_1_3():float
        -169 (-49.85% of base) : 87210.dasm - DevDiv_710234:Test(short,int,bool,double,int,int):long
        -930 (-43.70% of base) : 249053.dasm - ILGEN_0x1d013582:Method_0x7ef63454():long
         -80 (-42.78% of base) : 86868.dasm - ILGEN_0x947e98be:Main():int

290 total methods with Code Size differences (256 improved, 34 regressed), 24 unchanged.

The regressions are because of how the pre-order traversal is performed in assertion propagation. Consider, for example, this tree: MUL(ADD_Ovf(Cast_Ovf(Constant_LCL_VAR)), Constant). What happens is that we now see (via the value number) that the ADD is actually constant, and start extracting side effects, which are based on gtFlags, which in turn have less precise information than value numbering exception sets (except in cases which we do not model - that's why I could not just use strip the GTF_EXCEPT flag in e. g. fgValueNumberTree - not all possible exceptions are recorded).

What this all means is that we now end up calling fgMorphBlockStmt with a tree that looks like this: COMMA(Cast_Ovf(Constant_LCL_VAR), Constant), where previously we would call morph on the whole tree after having propagated the value for the LCL_VAR. Naively continuing propagating constants in the side effect list fixes these regressions, but otherwise does not produce any diffs, so I decided it is not worth it. I would very much love to know what approach could be taken to eliminate this inefficiency however (manually walking the tree and checking for both side effects and propagation candidates, or a callback that gtExtractSideEffList would accept?), without introducing too much complexity.

@SingleAccretion
Copy link
Contributor Author

SingleAccretion commented Apr 4, 2021

@vargaz, @BrzVlad I am pinging you as the owners of the codegen for Mono's AOT and interpreter. The test for checked casts that this PR introduces fails on both backends, with different behaviors:

  1. Interpreter, on WASM (reproduces with default Blazor app) - overflow exceptions are being thrown for a method like this one:
static void ConfirmSingleOneDecrementUnderSByteMinValueCastToSByteIsFoldedCorrectly()
{
    float singleOneDecrementUnderSByteMinValue = -128.00002f;

    if (BreakUpFlow())
        return;

    if (checked((sbyte)singleOneDecrementUnderSByteMinValue) != -128)
    {
        Console.WriteLine($"'(sbyte)-128.00002f' was evaluted to '{(sbyte)singleOneDecrementUnderSByteMinValue}'. Expected: '-128'.");
        _counter++;
    }
}
  1. LLVM AOT - no overflow exceptions are being thrown where they should be thrown:
static void ConfirmInt64MaxValueCastToInt64Overflows()
{
    float from = 9223372036854775807.0f;
    _counter++;
    try
    {
        _ = checked((long)from);
    }
    catch (OverflowException) { _counter--; }
    if (_counter != 100)
        Console.WriteLine("'checked((long)9223372036854775807.0f)' did not throw OverflowException.");
}

My current plan of action is to disable these tests on Mono and create an issue referencing them (since I do not have the skills or the environment required to properly debug it). Please let me know if that's not acceptable.

@BrzVlad
Copy link
Member

BrzVlad commented Apr 5, 2021

@SingleAccretion Sounds good to me

@SingleAccretion SingleAccretion force-pushed the Fold-Overflow-Operations-In-Value-Numbering branch from 52649d5 to 8733fa9 Compare April 5, 2021 16:38
@SingleAccretion
Copy link
Contributor Author

SingleAccretion commented Apr 5, 2021

It looks like CI is not very happy today... Oh well.

Flipping this as ready to review. I expect there are many rough edges to this that will need to be smoothed out - some questionable places I pointed out in the comments here.

@SingleAccretion SingleAccretion marked this pull request as ready for review April 5, 2021 21:23
@AndyAyersMS
Copy link
Member

@briansull PTAL

cc @dotnet/jit-contrib

@AndyAyersMS AndyAyersMS requested a review from briansull April 9, 2021 16:21
@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Apr 9, 2021
{
GenTree* thisOp = call->gtCallThisArg->GetNode();
GenTree* flagOp = call->gtCallArgs->GetNode();
GenTree* result = gtOptimizeEnumHasFlag(thisOp, flagOp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the new code optimization that you are adding. Right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Without it, the (new) cases in tests were not getting folded. That said, I did not find any diffs for this in the framework, so I would be 100% fine with backing this out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note as well: there are some DIV and MOD helpers that could be folded here too, so that cases like these would be optimized on 32 bit. But long divison / modulus are also very rare (may become not as rare if BigInteger is moved to use long internally).

src/coreclr/jit/gentree.cpp Show resolved Hide resolved
src/coreclr/jit/gentree.cpp Show resolved Hide resolved
default:
unreached();
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem with trying to do this kind of folding optimization, is that we support cross compilation, so this code has to be host independent and can't have different behaviors on different targets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, and I believe this logic to be IEEE-defined (and thus host / Jit's compiler-independent, modulo compliance bugs of course). cc @tannergooding

Copy link
Member

@tannergooding tannergooding Apr 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most simple operations have a strict IEEE definition, which is that the operation is computed as if to infinite precision and unbounded range and then rounded to the nearest representable result.
Where hardware directly supports the operation, it is generally IEEE compliant (with a few exceptions, like maxss/minss on x86; ARM64 however has compliant implementations).

This includes addition, subtraction, multiplication, division, negation, and conversions at the very least.

IEEE has a different definition for remainder than we do (hence Math.IEEERemainder). Likewise, there are some edge cases that are undefined behavior such as converting a float to an integral where the float is outside the bounds of the target integral (e.g. float f = long.MaxValue; int i = (int)f; is UB since f is greater than int.MaxValue)

We also have some bugs, such as in our implementation of ulong to double on x86 since it isn't implemented in hardware.

That being said, most of these edge cases are being fixed/normalized as they cause problems when comparing x86/x64 to ARM64. C# was already doing this for its own determinism needs when constant folding.

@briansull
Copy link
Contributor

Jit-format needs to be run on this change:

Formatting windows x64 - Failing after 5m

@SingleAccretion SingleAccretion force-pushed the Fold-Overflow-Operations-In-Value-Numbering branch from 8733fa9 to adc042d Compare April 12, 2021 23:41
@SingleAccretion
Copy link
Contributor Author

Jit-format needs to be run on this change:
Formatting windows x64 - Failing after 5m

@briansull should be fixed now.

@briansull
Copy link
Contributor

briansull commented Apr 13, 2021

I have just approved this change
Anyone else want to take a look at this rather large change

cc @dotnet/jit-contrib

Copy link
Contributor

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants